Spencer Parr Thesis Analysis

Info

Study Overview:
The primary objective of this study is to assess the impact of Clionid sponges on coral reef ecosystems in the U.S. Virgin Islands. Coral reefs are critical for biodiversity and marine life, but have faced significant declines due to environmental stressors, including climate change, ocean acidification, and disease outbreaks. Clionid sponges, as bioeroders, contribute to coral degradation by breaking down the calcium carbonate structure of corals, accelerating reef decline. By examining the prevalence and interactions of Cliona sponges with coral over time, this research aims to understand how these sponges influence reef health and resilience, particularly in response to environmental disturbances such as bleaching events and hurricanes.

This study utilizes long-term data collected by the Territorial Coral Reef Monitoring Program (TCRMP) from 2009 to 2023 across 34 reef sites in the USVI. The research focuses on three main questions: (1) how Cliona cover and coral interactions have changed over time, (2) how shifts in coral abundance affect Cliona-coral interactions, and (3) whether Cliona presence correlates with coral health, particularly disease prevalence. Through statistical modeling, including generalized linear models (GLMs) and linear mixed-effects models (LMEMs), the study will analyze the relationship between Cliona prevalence and environmental variables such as temperature, water quality, and coral species. This analysis is intended to reveal patterns of Cliona recruitment on stressed or diseased corals and to identify species-specific vulnerabilities, providing insights into the adaptive mechanisms of Cliona and guiding reef conservation strategies in light of changing ocean conditions.

Code Explanation:
The R scripts provided manage and manipulate TCRMP data, preparing it for statistical analysis. Key steps include filtering Cliona species data, calculating prevalence, and integrating environmental factors. Models such as Linear Mixed-Effects Models (LMEMs) and Generalized Linear Models (GLMs) are applied to assess Cliona recruitment patterns, coral interactions, and environmental influences.

Spencer Parr ## Metadata

Coral Health Data:
Coral health data is the base structure of the the TCRMP data collection.

Cliona Prevalence- Coral Health Data

This code explains the data management/manipulation and graphical evidence Cliona prevalence for each TCRMP location

  1. The data is read into R using read_csv, and then selected columns are extracted, specifically: Period, SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, and Spo2. This selection reduces the dataset to only the columns relevant to this analysis.

  2. Next, a new column called presence is created to indicate whether there is evidence of Cliona presence on each sample. The mutate function is used along with if_else to assign a value of 1 to presence if any of the following conditions are met: the Cliona value is greater than zero, Spo1ID or Spo2ID columns contain either “CLSP” or “BOSP”. If none of these conditions are met, presence is set to 0. After this, presence values are checked for any missing data, and replace_na is used to fill any NA values with 0, ensuring that presence is either 1 or 0 without any missing values.

  3. The code then applies several filters to refine the dataset. First, it filters for rows where Period is either “Annual” or “PostBL”, SampleType is either “Permanent” or “permanent”, and Method is either “intercept” or “50 cm belt”. This step ensures that only specific types of samples are kept for the analysis. Further, rows where Transect contains the letter “A” are removed, ensuring that only certain transect data is included. Finally, the code filters for rows where SampleYear is 2005 or later, focusing on more recent data.

  4. In the last step, additional transformations are applied. A new ID column is created to provide a unique identifier for each row, generated as a sequence from 1 to the number of rows in the filtered dataset. The SampleYear, Location, and Transect columns are converted to factor data types using as.factor to prepare them for categorical analysis. This finalizes the dataset with all necessary transformations, making it ready for further analysis or modeling.

#1
cliona <- read_csv("../Cliona.csv")%>%
  select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
  #2
  mutate(presence=if_else(Cliona>0 | Spo1ID %in% c("CLSP", "BOSP") | Spo2ID %in% c("CLSP", "BOSP"),1,0))%>%
  mutate(presence=replace_na(presence,0))%>%
  #3
  filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
  filter(!grepl("A",Transect))%>%
  filter(SampleYear>=2005)%>%
  #4
  mutate(ID=1:nrow(.),
         SampleYear=as.factor(SampleYear),
         Location=as.factor(Location),
         Transect=as.factor(Transect))
  1. This R code calculates Cliona sponge prevalence across transects by grouping the cliona dataset by Location, SampleYear, and Transect. Within each group, it calculates three metrics: prev (the average presence of Cliona, indicating prevalence), freq (total occurrences of Cliona), and total (total observations in that group). After ungrouping the data, it adds a unique ID to each row. The resulting transectprev dataset provides summarized prevalence data for each unique combination of location, year, and transect.
transectprev <- cliona %>%
  group_by(Location, SampleYear, Transect) %>%
  summarise(prev = mean(presence), freq = sum(presence), total = n())%>%
  ungroup()%>%
  mutate(ID= 1:nrow(.))
  1. This code creates a summary data frame, location_means, showing the average Cliona sponge prevalence for each location. It groups the transectprev data by Location and then calculates the mean of prev (prevalence) within each group, saving the result as mean_prev. The resulting location_means data frame has each location’s name and its corresponding average Cliona prevalence.
location_means <- transectprev %>%
  group_by(Location) %>%
  summarise(mean_prev = mean(prev))
  1. This code generates a plot that visualizes the prevalence of Cliona sponges at different reef sites by creating a boxplot for each location. The ggplot function uses the transectprev dataset, which contains Cliona prevalence data, and arranges the locations on the x-axis in descending order of prevalence. Each boxplot represents the range and distribution of Cliona prevalence values at that location. To enhance the visualization, the code includes a stat_summary layer that adds a green diamond marker at the mean prevalence for each location, making it easy to compare average Cliona prevalence across sites.

    ggplot(transectprev, aes(x = reorder(Location, -prev), y = prev)) +
      geom_boxplot(fill = "lightblue") +
      stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "darkgreen") +
      labs(title = "Prevalence of Cliona Sponges by Location (with Mean Prevalence Marked)",
           x = "Location",
           y = "Prevalence") +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

This plot is important because it reveals substantial variability in Cliona prevalence across different reef sites. Locations such as Salt River West, Brewers Bay, and Savana show significantly higher prevalence levels, as indicated by both the wider range and higher position of the boxplots and mean markers, compared to other sites with much lower prevalence. This variability suggests that certain sites may be more susceptible to Cliona sponge colonization, potentially due to environmental conditions or other factors specific to those areas. Identifying these high-prevalence sites is crucial for targeted reef management and conservation strategies, as it highlights areas where Cliona might contribute more heavily to coral degradation.

  1. This code produces a set of line graphs showing the prevalence of Cliona sponges over time, broken down by transect and location. Each location has its own panel (using facet_wrap), and within each panel, different transects are represented by distinct lines and colors. The x-axis represents the sample year, while the y-axis shows the prevalence of Cliona sponges. The lines connect the prevalence data points over time for each transect, while points mark each specific measurement. The scale_x_continuous function formats the x-axis labels to show each unique year in the dataset, improving clarity.
# Convert SampleYear to numeric, if appropriate
transectprev$SampleYear <- as.numeric(transectprev$SampleYear)

# Then plot with the updated variable
ggplot(transectprev, aes(x = SampleYear, y = prev, color = factor(Transect), group = Transect)) +
  geom_line() +
  geom_point() +
  facet_wrap(~Location, scales = "free_y") +
  labs(title = "Prevalence Over Years by Transect",
       x = "Year",
       y = "Prevalence",
       color = "Transect") +
  scale_x_continuous(breaks = unique(transectprev$SampleYear), labels = unique(transectprev$SampleYear)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))

This graph is valuable because it allows for a detailed temporal analysis of Cliona prevalence trends at multiple scales: across years, transects, and locations. By disaggregating the data by transect and location, the plot can reveal patterns, such as consistent increases or decreases in prevalence, or fluctuations in response to specific events or environmental changes. It’s particularly useful for identifying which transects or locations show higher susceptibility to Cliona over time, offering insights into possible environmental or ecological factors driving these patterns. This information is essential for reef management, as it helps target specific areas or transects that might need more frequent monitoring or interventions to mitigate Cliona spread.

Spencer Parr

Cliona Extent

This code explains the data management/manipulation and graphical evidence Cliona extent for each TCRMP location

  1. The data is read into R using read_csv, and then selected columns are extracted, specifically: Period, SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, and Spo2. This selection reduces the dataset to only the columns relevant to this analysis.

  2. the code generates two new columns, CLSP and BOSP, which indicate coverage values for two types of sponges. Using the mutate and case_when functions, it examines the columns Spo1ID and Spo2ID to populate these new columns. If Spo1ID contains “CLSP”, then the value of Spo1 is assigned to the CLSP column. Similarly, if Spo2ID contains “CLSP”, then the value from Spo2 is assigned to CLSP. The same process is used to populate the BOSP column, using “BOSP” as the indicator. If neither condition is met in either case, the value is set to NA.

  3. The code then filters the data based on several criteria. First, it keeps only rows where Period is either “Annual” or “PostBL” and SampleType is either “Permanent” or “permanent”, ensuring consistency in sample type naming. Also, only rows where Method is either “intercept” or “50 cm belt” are retained, narrowing the dataset to specific sampling methods. And rows with certain letters in the Transect column (“A”, “R”, or “L”) are excluded using a filtering condition with grepl, and only data from 2005 onward (SampleYear >= 2005) is kept.

  4. Finally, the data is reshaped from a wide format to a long format using pivot_longer. This transformation takes the columns CLSP, BOSP, and Cliona and combines them into two new columns: SpongeType and Extent. The SpongeType column identifies which type of sponge coverage is being recorded (either CLSP, BOSP, or Cliona), and Extent contains the corresponding value of coverage for each type.

#1. 
clionaext <- read_csv("../Cliona.csv")%>%
  select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
  
#2.
  mutate( CLSP = case_when( Spo1ID == "CLSP" ~ Spo1, Spo2ID == "CLSP" ~ Spo2, TRUE ~ NA_real_),
    BOSP = case_when( Spo1ID == "BOSP" ~ Spo1, Spo2ID == "BOSP" ~ Spo2,
      TRUE ~ NA_real_)) %>%
  select(-Spo1, -Spo1ID, -Spo2, -Spo2ID)%>%
  
#3.
  filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
  filter(!grepl("A", "R", "L",Transect))%>%
  filter(SampleYear>=2005)%>%
  mutate(ID=1:nrow(.),
         SampleYear=as.factor(SampleYear),
         Location=as.factor(Location),
         Transect=as.factor(Transect))%>%
  
#4. 
  pivot_longer(cols = c(CLSP, BOSP, Cliona), 
               names_to = "SpongeType", 
               values_to = "Extent")

For the first analysis, I ask the question: Has the extent of Cliona sponge coverage increased or decreased over time? Is there a pattern in prevalence by year?

  1. This code assigns the clionaext data frame to a new data frame named ext_time. It then updates ext_time by converting the SampleYear column from a factor to a numeric data type, ensuring that the year values are treated as continuous numbers rather than categorical labels.

    ext_time<-clionaext
    
    ext_time <- ext_time %>%
      mutate(SampleYear = as.numeric(as.character(SampleYear)))
  2. This code processes the clionaext data frame to calculate the total number of unique corals recorded in each year. First, it groups the data by SampleYear and uses n_distinct(ID) to count distinct ID values, representing unique coral observations. The resulting data frame, unique_coral_counts, contains SampleYear and total_unique_corals.

    Next, the SampleYear column is converted from a factor to numeric using mutate. This ensures the year values are treated as continuous numbers for analyses or visualizations that require numerical formatting.

unique_coral_counts <- clionaext %>%
  group_by(SampleYear) %>%
  summarise(total_unique_corals = n_distinct(ID)) 

unique_coral_counts <- unique_coral_counts%>%
  mutate(SampleYear = as.numeric(as.character(SampleYear)))
  1. This code creates a new data frame, ext_time_with_counts, by merging (left_join) the ext_time data frame with the unique_coral_counts data frame. The join is performed using the SampleYear column, which is common to both data frames.
ext_time_with_counts <- ext_time %>%
  left_join(unique_coral_counts, by = "SampleYear")
  1. This code processes the ext_time_with_counts data frame to summarize sponge coverage by year. It groups the data by SampleYear and calculates the total extent of sponge coverage (total_extent), retrieves the total unique coral count for each year (count_observations), and computes the weighted average extent (weighted_avg_extent) by dividing the total extent by the unique coral count.
weighted_summary <- ext_time_with_counts %>%
  group_by(SampleYear) %>%
  summarise(
    total_extent = sum(Extent, na.rm = TRUE),
    count_observations = first(total_unique_corals),  # Use `first` to avoid altering the count
    weighted_avg_extent = total_extent / count_observations
  )
  1. This code creates a line plot using the weighted_summary data frame to visualize the weighted average extent of Cliona sponge coverage over time. It maps SampleYear to the x-axis and weighted_avg_extent to the y-axis, adding a blue line and points to highlight the data, along with a minimal theme and descriptive labels for the title, x-axis, and y-axis.
ggplot(weighted_summary, aes(x = SampleYear, y = weighted_avg_extent)) +
  geom_line(color = "blue") +
  geom_point(size = 2) +
  labs(
    title = "Weighted Average Cliona Sponge Coverage Over Time",
    x = "Year",
    y = "Weighted Average Extent of Coverage"
  ) +
  theme_minimal()

Next I ask the question: How does the extent of coverage differ between coral species each year? Are there species that experience fluctuating or consistent coverage levels over time?

  1. This code creates a summary data frame, spp_summary, by grouping the ext_time_with_counts data frame by SampleYear and SPP (coral species). For each group, it calculates the total sponge coverage extent (total_extent), retrieves the total unique coral count for the year (count_observations), and computes the weighted average sponge extent (weighted_avg_extent) by dividing the total extent by the unique coral count. This provides a standardized metric of sponge coverage for each coral species across years.
spp_summary <- ext_time_with_counts %>%
  group_by(SampleYear, SPP) %>%
  summarise(
    total_extent = sum(Extent, na.rm = TRUE),
    count_observations = first(total_unique_corals),
    weighted_avg_extent = total_extent / count_observations
  )
  1. This code creates a line plot using the spp_summary data frame to visualize the weighted average extent of Cliona sponge coverage over time for each coral species (SPP). The x-axis represents SampleYear, the y-axis represents weighted_avg_extent, and different coral species are distinguished by color.
ggplot(spp_summary, aes(x = SampleYear, y = weighted_avg_extent, color = SPP)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Weighted Average Cliona Sponge Coverage by Coral Species Over Time",
    x = "Year",
    y = "Weighted Average Extent of Coverage"
  ) +
  theme_minimal()

Spencer Parr

Substrate Preference

Ivlev’s Electivity Index

The Ivlev’s Electivity Index (Ei) is a quantitative measure used to determine an organism’s preference for certain food or habitat types relative to their availability. In this thesis, it is applied to assess the Cliona delitrix sponge’s substrate preferences on coral reefs. Specifically, the index compares the proportion of coral species colonized by Cliona to the proportion of those species available in the environment. A positive Ei value indicates a preference, while negative values suggest avoidance. This helps in understanding how Cliona sponges select coral species under different environmental conditions.

Electivity indices measure the utilization of food types (r) in relation to their abundance or availability in the environment (p). Foods that constitute a larger proportion of the diet than of the available foods are considered preferred; conversely those proportionately underrepresented in the diet are avoided.

Figure 1: Prey selectivity according to Ivlev’s selectivity index of newly emerged brown trout fry in the River Iso (NW Spain). Modified from Elías, Jacinto & Díaz et al. (2013)

While Ivlev’s Electivity Index is typically used for dietary preferences, it can still be effectively applied to Cliona delitrix sponges in this thesis, despite corals not being a food source. Cliona sponges exhibit substrate preferences for certain coral species based on factors such as calcium carbonate concentration. Some stony corals have denser calcium carbonate skeletons, which may be harder for Cliona to excavate and colonize. Thus, the sponge’s choice is driven by its ability to grow and obtain necessary nutrients, similar to dietary selectivity in food webs.

  1. Coral Abundance by Species and Location:
  • The code first calculates how many times each coral species appears at each location (site). This gives a general sense of coral distribution across different locations, providing a simple count of occurrences per species at each site.
  1. Coral Abundance by Species, Location, Transect, and Year:
  • The second part refines the calculation to be more detailed by including both transects (smaller sampling areas within locations) and sample years. This breakdown gives the number of occurrences for each species not only by location but also by transect and year, making it possible to track coral abundance more precisely across both space and time.
# Calculate coral abundance by transect and year
coral_abundance <- cliona %>%
  group_by(SPP, Location, Transect, SampleYear) %>%
  summarise(Abundance = n()) %>%
  ungroup()

clionaclean1 <- cliona %>%
  left_join(coral_abundance, by = c("SPP", "Location", "Transect", "SampleYear"))
  1. Summarizing Total Presence and Abundance:
  • The code calculates two key metrics for each coral species (SPP). The code creates a summary of each coral species, providing an overall count of how often each species was present and its total abundance across the entire dataset. This allows for a better understanding of the relative presence and prevalence of different coral species in the study area.
# Summarize data to calculate total presence and abundance of each coral species
summary_cliona <- clionaclean1 %>%
  group_by(SPP) %>%
  summarise(TotalPresence = sum(presence), 
            TotalAbundance = sum(Abundance)) %>%  # Adjust for coral abundance
  ungroup()
  • This code calculates the proportions of presence and availability for each coral species and then computes Ivlev’s Electivity Index to determine whether species are disproportionately present relative to their availability in the environment.
# Calculate proportions and then Ivlev's Electivity Index adjusted for coral abundance
 Ivlev_Index <- summary_cliona %>%
   mutate(ProportionPresence = TotalPresence / sum(TotalPresence),
         ProportionAvailability = TotalAbundance / sum(TotalAbundance),  # Use abundance
          IvlevIndex = (ProportionPresence - ProportionAvailability) /
            (ProportionPresence + ProportionAvailability))

Ivlev_Index %>%
  kable(caption = "Ivlev's Electivity Index Adjusted for Coral Abundance", digits = 4)
Ivlev’s Electivity Index Adjusted for Coral Abundance
SPP TotalPresence TotalAbundance ProportionPresence ProportionAvailability IvlevIndex
AA 72 41794 0.0378 0.0618 -0.2408
AC 0 21 0.0000 0.0000 -1.0000
AF 0 229 0.0000 0.0003 -1.0000
AG 9 1142 0.0047 0.0017 0.4736
AGSP 3 17256 0.0016 0.0255 -0.8837
AH 9 10869 0.0047 0.0161 -0.5455
AL 29 9780 0.0152 0.0145 0.0259
AP 0 4 0.0000 0.0000 -1.0000
AT 0 13 0.0000 0.0000 -1.0000
AU 0 41 0.0000 0.0001 -1.0000
CN 7 403 0.0037 0.0006 0.7210
DCY 0 60 0.0000 0.0001 -1.0000
DL 7 365 0.0037 0.0005 0.7440
DSO 1 58 0.0005 0.0001 0.7193
EF 0 232 0.0000 0.0003 -1.0000
FF 0 17 0.0000 0.0000 -1.0000
HC 0 1355 0.0000 0.0020 -1.0000
IR 0 10 0.0000 0.0000 -1.0000
IS 0 4 0.0000 0.0000 -1.0000
MAFO 0 62 0.0000 0.0001 -1.0000
MAL 0 15 0.0000 0.0000 -1.0000
MAN 0 6 0.0000 0.0000 -1.0000
MAR 0 25 0.0000 0.0000 -1.0000
MC 87 7315 0.0457 0.0108 0.6172
MD 11 2743 0.0058 0.0041 0.1751
MDA 0 3 0.0000 0.0000 -1.0000
MDSP 0 33 0.0000 0.0000 -1.0000
MF 1 24 0.0005 0.0000 0.8734
MILA 8 4438 0.0042 0.0066 -0.2193
MILC 0 702 0.0000 0.0010 -1.0000
MILSP 0 4 0.0000 0.0000 -1.0000
ML 0 9 0.0000 0.0000 -1.0000
MM 0 317 0.0000 0.0005 -1.0000
MME 0 793 0.0000 0.0012 -1.0000
MP 0 51 0.0000 0.0001 -1.0000
MYSP 0 76 0.0000 0.0001 -1.0000
OA 396 26709 0.2080 0.0395 0.6808
OFAV 81 5270 0.0425 0.0078 0.6904
OFRA 166 78250 0.0872 0.1157 -0.1406
OX 107 38927 0.0562 0.0576 -0.0120
PA 304 224707 0.1597 0.3323 -0.3509
PB 0 1 0.0000 0.0000 -1.0000
PBSP 2 1460 0.0011 0.0022 -0.3454
PC 0 116 0.0000 0.0002 -1.0000
PCL 4 91 0.0021 0.0001 0.8796
PD 0 413 0.0000 0.0006 -1.0000
PF 1 435 0.0005 0.0006 -0.1010
PP 52 107932 0.0273 0.1596 -0.7078
PS 26 1128 0.0137 0.0017 0.7823
SB 0 8 0.0000 0.0000 -1.0000
SC 0 48 0.0000 0.0001 -1.0000
SCSP 0 53 0.0000 0.0001 -1.0000
SI 76 3601 0.0399 0.0053 0.7646
SL 0 1 0.0000 0.0000 -1.0000
SR 1 5785 0.0005 0.0086 -0.8843
SS 444 81086 0.2332 0.1199 0.3209
SSPP 0 1 0.0000 0.0000 -1.0000
TC 0 1 0.0000 0.0000 -1.0000
UNK 0 1 0.0000 0.0000 -1.0000
  1. The resulting graph shows a clear separation at zero, with bars above zero (blue) indicating a preference for certain coral species and bars below zero (red) indicating avoidance. This pattern reveals that Cliona generally avoids many coral species, with only a few species showing a positive electivity index, suggesting that Cliona selectively colonizes specific corals. This insight helps identify coral species that are more vulnerable to Cliona colonization, which is important for managing coral health and understanding Cliona’s ecological impact on reef systems.
ggplot(Ivlev_Index, aes(x = reorder(SPP, IvlevIndex), y = IvlevIndex, fill = IvlevIndex > 0)) +
  geom_bar(stat = "identity", width = 0.8) +
  labs(title = "Ivlev's Electivity Index for Cliona on Different Coral Species",
       x = "Coral Species",
       y = "Ivlev's Electivity Index") +
  theme_minimal(base_size = 14) +
  scale_fill_manual(values = c("red", "steelblue"), 
                    name = "Preference",
                    labels = c("Avoidance", "Preference")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10),  # Adjust angle and font size
        plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  # Center and bold title
        legend.position = "top") +  # Move legend to the top
  geom_hline(yintercept = 0, linetype = "dashed", color = "black", size = 0.5)  # Add horizontal line at 0

Spencer Parr